Blocks+: a non-redundant database of protein alignment blocks derived from multiple compilations

Authors

  • Steven Henikoff
  • Jorja G. Henikoff
  • Shmuel Pietrokovski
Abstract

MOTIVATION: As databanks grow, sequence classification and prediction of function by searching protein family databases become increasingly valuable. The original Blocks Database, which contains ungapped multiple alignments for families documented in PROSITE, can be searched to classify new sequences. However, PROSITE is incomplete, and families from other databases are now available to expand coverage of the Blocks Database.

RESULTS: To take advantage of protein family information present in several existing compilations, we have used five databases to construct Blocks+, a unified database that is built on the PROTOMAT/BLOSUM scoring model and that can be searched using a single algorithm for consistent sequence classification. The LAMA blocks-versus-blocks searching program identifies overlapping protein families, making possible a non-redundant hierarchical compilation. Blocks+ consists of all blocks derived from PROSITE, blocks from Prints not present in PROSITE, blocks from Pfam-A not present in PROSITE or Prints, and so on for ProDom and Domo, for a total of 1995 protein families represented by 8909 blocks, doubling the coverage of the original Blocks Database. A challenge for any procedure aimed at non-redundancy is to retain related but distinct families while discarding those that are duplicates. We illustrate how using multiple compilations can minimize this potential problem by examining the SNF2 family of ATPases, which is detectably similar to distinct families of helicases and ATPases.

AVAILABILITY: http://blocks.fhcrc.org/
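The hierarchical compilation described in the abstract (all PROSITE families first, then families from each later source only if they are not already covered by a higher-priority source) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Family` type, the `shares_members` overlap test, and the example data are hypothetical, and the shared-member test is only a stand-in for the block-versus-block comparison that LAMA performs.

```python
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Tuple


class Family(NamedTuple):
    """Hypothetical stand-in for a protein family: a name plus sequence IDs."""
    name: str
    members: FrozenSet[str]


# Source priority order as listed in the abstract.
PRIORITY = ["PROSITE", "Prints", "Pfam-A", "ProDom", "Domo"]


def shares_members(a: Family, b: Family) -> bool:
    """Toy overlap test (an assumption): two families overlap if they share
    any sequence ID. The real comparison is made on blocks by LAMA."""
    return not a.members.isdisjoint(b.members)


def build_nonredundant(
    by_source: Dict[str, List[Family]],
    overlaps: Callable[[Family, Family], bool] = shares_members,
) -> List[Tuple[str, Family]]:
    """Keep every family from the first source, then only those families from
    each later source that do not overlap a family kept from an earlier one."""
    kept: List[Tuple[str, Family]] = []
    for source in PRIORITY:
        # Compare only against families kept from higher-priority sources.
        higher = [fam for _, fam in kept]
        for fam in by_source.get(source, []):
            if not any(overlaps(fam, prev) for prev in higher):
                kept.append((source, fam))
    return kept


# Invented example data, for illustration only.
example = {
    "PROSITE": [Family("fam_a", frozenset({"s1", "s2"}))],
    "Prints": [
        Family("fam_a_dup", frozenset({"s2", "s3"})),
        Family("fam_b", frozenset({"s4"})),
    ],
    "Pfam-A": [
        Family("fam_b_dup", frozenset({"s4", "s5"})),
        Family("fam_c", frozenset({"s6"})),
    ],
}

result = build_nonredundant(example)
for source, fam in result:
    print(source, fam.name)
```

With a test this coarse, related but distinct families would be discarded along with true duplicates, which is the difficulty the abstract raises with the SNF2 example; a realistic overlap criterion would need to be considerably more selective.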

Related resources

New features of the Blocks Database servers

Blocks are ungapped multiple sequence alignments representing conserved protein regions, and the Blocks Database consists of blocks from documented protein families. World Wide Web (http://www.blocks.fhcrc.org) and Email ([email protected]) servers provide tools for homology searching and for analyzing protein family relationships. New enhancements include a multiple alignment processor ...

A Transition Probability Model for Amino Acid Substitutions from Blocks

Substitution matrices have been useful for sequence alignment and protein sequence comparisons. The BLOSUM series of matrices, which had been derived from a database of alignments of protein blocks, improved the accuracy of alignments previously obtained from the PAM-type matrices estimated from only closely related sequences. Although BLOSUM matrices are scoring matrices now widely used for pr...

Profile Clusters Derived from BLOCKS Suggest a Simple Model of Column Evolution in Multiple Alignments of Protein Families

BLOSUM and PAM series of protein substitution matrices are popular tools for scoring protein pairwise and multiple alignments. For protein multiple alignments there exists another representation of an evolving column, based on a set of predefined frequency profile patterns. For conserved sites, these profile patterns represent stationary points in a 20-dimensional profile space. There are 20 su...

Blocks-based methods for detecting protein homology.

The most highly conserved regions of proteins can be represented as blocks of aligned sequence segments, typically with multiple blocks for a given protein family. The Blocks Database World Wide Web (http://blocks.fhcrc.org) and e-mail (blocks@blocks.fhcrc.org) servers provide tools to search DNA and protein queries against the Blocks+ Database of multiple alignments. We describe features for ...

Recent enhancements to the Blocks Database servers

The Blocks Database contains multiple alignments of conserved regions in protein families which can be searched by e-mail ([email protected]) and World Wide Web (http://blocks.fhcrc.org/) servers to classify protein and nucleotide sequences. Recent enhancements to the servers include: (i) improved calculation of position-specific scoring matrices from blocks; (ii) availability of the Pri...

Journal:
  • Bioinformatics

Volume 15, Issue 6

Pages: -

Publication date: 1999